Skip to content

Handle PlanarConfiguration=2 across CPU + GPU multi-band reads#1532

Merged
brendancol merged 3 commits intoxarray-contrib:mainfrom
brendancol:fix/planar-config-multiband
May 9, 2026
Merged

Handle PlanarConfiguration=2 across CPU + GPU multi-band reads#1532
brendancol merged 3 commits intoxarray-contrib:mainfrom
brendancol:fix/planar-config-multiband

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

Two related multi-band accuracy bugs from the geotiff audit:

A2 (HIGH): GPU tiled reads of planar=2 files returned wrong pixel values.
gpu_decode_tiles / _assemble_tiles_kernel iterate a single TileOffsets
list with bytes_per_pixel = itemsize * samples and assume chunky interleave.
For PlanarConfiguration=2, each band has its own contiguous run of tiles in
TileOffsets, so the kernel read across band boundaries.

A3 (MEDIUM): The GPU stripped multi-band fallback hardcoded dims=['y','x']
on a 3-D CPU read, the same regression that PR #1525 addresses for the stripped
path. Folded in here so the planar=2 stripped tests pass without depending on
PR #1525 merging first.

Shared root cause: neither path branched on ifd.planar_config.

Fix

  • read_geotiff_gpu: when ifd.planar_config == 2 and samples > 1, slice
    TileOffsets / TileByteCounts per band, decode each slab as a single-band
    tile sequence, then cupy.stack(..., axis=2) into (H, W, samples). Chunky
    (planar=1) goes through the existing single-pass kernel unchanged.
  • read_geotiff_gpu stripped fallback: pick ('y', 'x', 'band') with a band
    coord when the CPU array is 3-D (mirrors PR Fix read_geotiff_gpu dim mismatch on stripped multi-band TIFFs #1525).
  • Both multi-band paths now assert the assembled shape equals
    (H, W, samples) before returning.

Diff is scoped to xrspatial/geotiff/__init__.py -- the GPU kernels are not
touched, so other GPU codecs and the chunky/predictor paths see no behaviour
change.

Paths that now diverge correctly

Layout Planar Backend Path
Tiled 1 GPU unchanged: single gpu_decode_tiles
Tiled 2 GPU new: per-band decode + cupy.stack
Stripped 1 / 2 CPU unchanged
Stripped 1 / 2 GPU dim fix (mirrors PR #1525)

Test plan

  • pytest xrspatial/geotiff/tests/test_planar_multiband.py -- 53 passed
    (planar in {separate, contig} x layout in {tiled, stripped} x bands in {2,3,4}
    x dtype in {uint8, uint16} x backend in {CPU, GPU}, plus single-band sanity
    and an explicit A3 repro). Seeds fixed; GPU tests skip without CUDA.
  • pytest xrspatial/geotiff/tests/ -k "multi or band or planar or strip or tile" -- 303 passed
  • pytest xrspatial/geotiff/tests/test_predictor_multisample.py -- 33 passed
  • pytest xrspatial/geotiff/tests/test_gpu_strict_fallback_1516.py -- 5 passed
  • PR Fix read_geotiff_gpu dim mismatch on stripped multi-band TIFFs #1525's test_gpu_stripped_multiband.py cases verified manually
    against this branch (3-band uint8, 2-band uint16, single-band sanity)
  • Full pytest xrspatial/geotiff/tests/ -- 837 passed (3 pre-existing
    matplotlib-deepcopy failures in test_features.py unrelated to this change)

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 8, 2026
@brendancol brendancol requested a review from Copilot May 8, 2026 20:55
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes GeoTIFF multi-band read correctness for PlanarConfiguration=2 (separate planar) across GPU tiled reads, and aligns the GPU stripped fallback’s dimension handling with the CPU reader so multi-band stripped outputs consistently return (y, x, band).

Changes:

  • Add a new GPU tiled read branch for planar=2 that decodes per-band tile slabs and stacks them into (H, W, samples).
  • Fix GPU stripped fallback dims/coords so 3-D CPU reads return ('y', 'x', 'band') rather than forcing 2-D dims.
  • Add comprehensive tests covering planar configuration × layout × band count × dtype across CPU and GPU backends.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File Description
xrspatial/geotiff/__init__.py Adds planar=2 handling in read_geotiff_gpu, introduces a per-band tile decode helper, and fixes stripped fallback dims.
xrspatial/geotiff/tests/test_planar_multiband.py New test module validating CPU/GPU multi-band correctness and dims across planar/layout combinations.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread xrspatial/geotiff/__init__.py Outdated
Comment on lines +1400 to +1409
def _gpu_decode_single_band_tiles(
source, offsets, byte_counts,
tw, th, width, height,
compression, predictor, file_dtype,
*,
byte_order: str,
gpu: str,
overview_level,
cupy,
):
Comment thread xrspatial/geotiff/__init__.py Outdated
Comment thread xrspatial/geotiff/__init__.py
Comment thread xrspatial/geotiff/__init__.py Outdated
Comment thread xrspatial/geotiff/__init__.py Outdated
Comment thread xrspatial/geotiff/__init__.py Outdated
brendancol added a commit to brendancol/xarray-spatial that referenced this pull request May 9, 2026
Seven findings, all acted on:

- Sparse-tile bypass: planar=2 branch ran before the has_sparse_tile
  check, so a planar=2 file with sparse tiles would hit the GPU path
  that doesn't handle sparse blocks. Add `not has_sparse_tile` to the
  planar=2 gate so sparse always routes to the CPU reader.
- Strict offset-count check: planar=2 required len(offsets) ==
  tiles_per_band * samples, but validate_tile_layout only requires
  >=. Files with extra trailing entries that pass the CPU reader were
  rejected here. Relax to a minimum-length check and slice the first
  tiles_per_band * samples entries.
- Auto-mode fallback semantics: stage-2 GPU failure in
  _gpu_decode_single_band_tiles raised unconditionally instead of
  letting the caller fall back to a whole-image CPU read like the
  chunky path does. Helper now returns None on stage-2 auto failure;
  caller catches the signal, runs read_to_array + cupy.asarray, and
  matches the chunky path's gpu='auto' contract. Strict mode still
  re-raises.
- Per-band file re-reads: the helper called src.read_all() for each
  band's stage-2 fallback, doing N redundant full-file reads on a
  planar=2 image with N bands. Caller now does read_all() once before
  the band loop and passes the bytes via a new shared_data parameter.
- Unused params: drop overview_level and cupy from the helper
  signature; neither was referenced.
- Two assert statements for runtime shape validation replaced with
  explicit `if ...: raise RuntimeError(...)` so the checks survive
  python -O.

53 planar tests + 303 multi/band/planar/strip/tile/sparse regression
tests still pass.
brendancol added 3 commits May 9, 2026 06:05
Two related multi-band accuracy bugs from the geotiff audit:

A2 (HIGH): GPU tiled reads of planar=2 files (separate band planes)
returned wrong pixel values. gpu_decode_tiles iterates a single
TileOffsets list with bytes_per_pixel = itemsize * samples and assumes
chunky interleave. For planar=2, each band has its own contiguous run
of tiles in TileOffsets, so the kernel read across band boundaries and
produced garbage. read_geotiff_gpu now detects planar==2 and decodes
each band's tile slab as a single-band image, then stacks the results
into (H, W, samples). The chunky planar=1 path is unchanged.

A3 (MEDIUM): The GPU stripped multi-band fallback hardcoded
dims=['y','x'] even for 3-D CPU reads, mirroring the pre-xarray-contrib#1525 bug.
The same fix lands here so the planar=2 stripped tests pass; this
matches the change in PR xarray-contrib#1525 verbatim.

Both paths assert the assembled shape equals (H, W, samples) before
returning, so a future kernel regression surfaces immediately rather
than producing garbage.

Adds xrspatial/geotiff/tests/test_planar_multiband.py covering 53
combinations: planar in {separate, contig} x layout in {tiled, stripped}
x bands in {2, 3, 4} x dtype in {uint8, uint16} x backend in
{CPU, GPU}, plus single-band sanity checks and an explicit A3 repro.
GPU tests skip cleanly when CUDA is unavailable. Seeds are fixed.

Refs audit issues A2, A3.
Seven findings, all acted on:

- Sparse-tile bypass: planar=2 branch ran before the has_sparse_tile
  check, so a planar=2 file with sparse tiles would hit the GPU path
  that doesn't handle sparse blocks. Add `not has_sparse_tile` to the
  planar=2 gate so sparse always routes to the CPU reader.
- Strict offset-count check: planar=2 required len(offsets) ==
  tiles_per_band * samples, but validate_tile_layout only requires
  >=. Files with extra trailing entries that pass the CPU reader were
  rejected here. Relax to a minimum-length check and slice the first
  tiles_per_band * samples entries.
- Auto-mode fallback semantics: stage-2 GPU failure in
  _gpu_decode_single_band_tiles raised unconditionally instead of
  letting the caller fall back to a whole-image CPU read like the
  chunky path does. Helper now returns None on stage-2 auto failure;
  caller catches the signal, runs read_to_array + cupy.asarray, and
  matches the chunky path's gpu='auto' contract. Strict mode still
  re-raises.
- Per-band file re-reads: the helper called src.read_all() for each
  band's stage-2 fallback, doing N redundant full-file reads on a
  planar=2 image with N bands. Caller now does read_all() once before
  the band loop and passes the bytes via a new shared_data parameter.
- Unused params: drop overview_level and cupy from the helper
  signature; neither was referenced.
- Two assert statements for runtime shape validation replaced with
  explicit `if ...: raise RuntimeError(...)` so the checks survive
  python -O.

53 planar tests + 303 multi/band/planar/strip/tile/sparse regression
tests still pass.
The previous fix did read_all() unconditionally before the band loop
to amortise a stage-2 fallback across bands. When every band's GDS
path succeeds, those bytes are never used. Replace the eager
read_all() with a closure that materialises the buffer on first call
and caches it, so:

- Every-band-GDS-succeeds case: 0 read_all() calls (was 1).
- Any-band-falls-back case: exactly 1 read_all() call regardless of
  how many bands fall back (was N).

Same Copilot-finding-xarray-contrib#4 fix, just lazier.
@brendancol brendancol force-pushed the fix/planar-config-multiband branch from 67bb8c2 to 5560775 Compare May 9, 2026 13:06
@brendancol brendancol merged commit 862bea1 into xarray-contrib:main May 9, 2026
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants